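Yesterday I covered the terminology and concepts of OpenTelemetry.
Today let's set up Jaeger, one of the tracing systems that supports OpenTelemetry.
Jaeger is a CNCF project, inspired by Dapper and OpenZipkin.
It is a distributed tracing system open-sourced by Uber, used to monitor and troubleshoot call chains in distributed systems.
Uber also published a post on its own blog, Evolving Distributed Tracing, which walks through how distributed tracing at Uber evolved from the early days to the birth of Jaeger.
Jaeger's service architecture is shown in the figure below.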
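The Jaeger client is really just a library, with implementations for OpenTelemetry and OpenTracing.
The Agent is an implementation of the Sidecar pattern. It takes the spans that the client sends over UDP and pushes them to the Collector in batches.
Its main purpose is to hide the details of routing to the Collector from the client.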
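To make the batching idea concrete, here is a minimal sketch in Go. It is not Jaeger's agent code: the `Span` type, the `Batcher` type, and the batch size are all hypothetical names made up for illustration, and the real agent receives spans over UDP and forwards them over gRPC.

```go
package main

import "fmt"

// Span is a hypothetical, stripped-down span used only for this sketch.
type Span struct {
	TraceID string
	Name    string
}

// Batcher buffers spans and hands them to flush in groups of at most size.
type Batcher struct {
	size  int
	buf   []Span
	flush func([]Span)
}

// NewBatcher returns a Batcher that calls flush whenever size spans are buffered.
func NewBatcher(size int, flush func([]Span)) *Batcher {
	return &Batcher{size: size, flush: flush}
}

// Add buffers one span and flushes as soon as the buffer is full.
func (b *Batcher) Add(s Span) {
	b.buf = append(b.buf, s)
	if len(b.buf) >= b.size {
		b.Flush()
	}
}

// Flush sends whatever is buffered, if anything, and resets the buffer.
func (b *Batcher) Flush() {
	if len(b.buf) == 0 {
		return
	}
	out := make([]Span, len(b.buf))
	copy(out, b.buf) // copy so the caller never sees the reused buffer
	b.buf = b.buf[:0]
	b.flush(out)
}

func main() {
	var batches [][]Span
	b := NewBatcher(2, func(spans []Span) { batches = append(batches, spans) })
	for _, name := range []string{"a", "b", "c"} {
		b.Add(Span{TraceID: "t1", Name: name})
	}
	b.Flush() // push the remainder, as an agent might on a timer or at shutdown
	fmt.Println("batches sent:", len(batches))
}
```

The client only ever talks to the batcher, so where the batches end up (here, a slice) can change without the client knowing.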
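The Collector gathers spans, then validates, transforms, and indexes them before writing them to the DB.
The Collector can be configured with sampling logic, and it collects and processes spans according to those sampling settings.
Because this component is stateless, you can run many Collectors to speed up writes to the DB.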
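The validate, sample, transform, index, and write steps can be sketched roughly like this. This is only a toy model under my own assumptions, not how Jaeger implements its Collector: `Span`, `Collector`, the hash-based `sampled` function, and the in-memory map standing in for the DB are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
	"strings"
)

// Span is a hypothetical, minimal span; real Jaeger spans carry far more fields.
type Span struct {
	TraceID   string
	Operation string
}

// Collector is a toy stand-in for the collector's processing pipeline.
type Collector struct {
	rate  float64           // fraction of traces to keep, between 0 and 1
	store map[string][]Span // spans indexed by trace ID, standing in for the DB
}

// NewCollector returns a Collector that keeps roughly rate of all traces.
func NewCollector(rate float64) *Collector {
	return &Collector{rate: rate, store: make(map[string][]Span)}
}

// validate rejects spans that are missing required fields.
func validate(s Span) error {
	if s.TraceID == "" {
		return errors.New("span has no trace ID")
	}
	if strings.TrimSpace(s.Operation) == "" {
		return errors.New("span has no operation name")
	}
	return nil
}

// sampled hashes the trace ID so that every span of one trace gets the same decision.
func sampled(traceID string, rate float64) bool {
	if rate >= 1 {
		return true
	}
	if rate <= 0 {
		return false
	}
	h := fnv.New32a()
	h.Write([]byte(traceID))
	return float64(h.Sum32())/float64(^uint32(0)) < rate
}

// Collect validates, samples, normalizes, and stores one span.
func (c *Collector) Collect(s Span) error {
	if err := validate(s); err != nil {
		return err
	}
	if !sampled(s.TraceID, c.rate) {
		return nil // dropped by sampling, which is not an error
	}
	s.Operation = strings.ToLower(strings.TrimSpace(s.Operation)) // the "transform" step
	c.store[s.TraceID] = append(c.store[s.TraceID], s)            // the "index and write" step
	return nil
}

func main() {
	c := NewCollector(1.0)
	for _, s := range []Span{
		{TraceID: "t1", Operation: " Foo "},
		{TraceID: "t1", Operation: "bar"},
		{TraceID: "", Operation: "orphan"},
	} {
		if err := c.Collect(s); err != nil {
			fmt.Println("rejected:", err)
		}
	}
	fmt.Println("spans stored for t1:", len(c.store["t1"]))
}
```

Note that `Collect` keeps no state between calls other than the store itself, which is why several instances writing to one shared DB can run side by side.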
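For the DB, Jaeger supports Cassandra, Elasticsearch, and Kafka (Kafka acting as an intermediate buffer in front of the storage rather than a store you query).
The official recommendation is Cassandra, for two reasons: Cassandra is a key-value database, so lookups by TraceID are very efficient, and its write throughput is quite good.
But if the goal is analytical queries, Elasticsearch is the more practical choice.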
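The difference between the two access patterns can be illustrated with a small sketch. Nothing here talks to Cassandra or Elasticsearch: the `Span` type and both functions are hypothetical, and a Go map simply plays the role of a key-value store.

```go
package main

import "fmt"

// Span is a hypothetical, minimal span used only for this comparison.
type Span struct {
	TraceID    string
	Service    string
	DurationMs int
}

// indexByTrace groups spans by trace ID, like a key-value store keyed on TraceID.
// Fetching one trace afterwards is a single key lookup.
func indexByTrace(spans []Span) map[string][]Span {
	idx := make(map[string][]Span)
	for _, s := range spans {
		idx[s.TraceID] = append(idx[s.TraceID], s)
	}
	return idx
}

// slowSpans answers an analytical question ("which spans of this service took
// at least minMs?"). With only a TraceID key it has to scan every span, which
// is the kind of query a search engine with secondary indexes handles better.
func slowSpans(spans []Span, service string, minMs int) []Span {
	var out []Span
	for _, s := range spans {
		if s.Service == service && s.DurationMs >= minMs {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	spans := []Span{
		{"t1", "trace-demo", 12},
		{"t1", "trace-demo", 250},
		{"t2", "other", 300},
	}
	byTrace := indexByTrace(spans)
	fmt.Println("spans in t1:", len(byTrace["t1"]))
	fmt.Println("slow trace-demo spans:", len(slowSpans(spans, "trace-demo", 100)))
}
```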
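Jaeger Query receives query requests, retrieves the data from the DB, and presents it through the UI.
Jaeger Query is also stateless, so you can start multiple instances.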
Install Jaeger-Collector, Jaeger-Agent, and Jaeger-Query,
along with an Elasticsearch cluster (Master + Node).
version: "3.6"
services:
    jaeger-collector:
        image: jaegertracing/jaeger-collector
        command: 
            - --es.num-shards=2
            - --es.num-replicas=0
            - --es.server-urls=http://172.16.230.100:9200,http://172.16.230.101:9201
            - --collector.zipkin.host-port=:9411
        ports: 
            - "14269"
            - "14268:14268"
            - "14250"
            - "9411:9411"
        environment:
            - SPAN_STORAGE_TYPE=elasticsearch
            - LOG_LEVEL=debug
        networks:
            jaeger_net:
                ipv4_address: 172.16.230.2
        depends_on:
            - elasticsearch-master
    jaeger-query:
        image: jaegertracing/jaeger-query
        command: 
            - --es.num-shards=2
            - --es.num-replicas=0
            - --es.server-urls=http://172.16.230.100:9200,http://172.16.230.101:9201
        ports:
            - "16686:16686"
            - "16687"
        environment: 
            - SPAN_STORAGE_TYPE=elasticsearch
            - LOG_LEVEL=debug
        networks:
            jaeger_net:
                ipv4_address: 172.16.230.3
        depends_on:
            - elasticsearch-master
    jaeger-agent:
        image: jaegertracing/jaeger-agent
        command: 
            - --reporter.grpc.host-port=jaeger-collector:14250
            - --reporter.grpc.retry.max=1000
        ports:
            - "5775:5775/udp"
            - "6831:6831/udp"
            - "6832:6832/udp"
            - "5778:5778"
        environment: 
            - LOG_LEVEL=debug
        networks:
            jaeger_net:
                ipv4_address: 172.16.230.4
        depends_on: 
            - jaeger-collector
    elasticsearch-master:
        container_name: es-master01
        hostname: es-master01
        image: elasticsearch:7.1.1
        volumes:
            - ./elasticsearch/master/conf/es-master.yml:/usr/share/elasticsearch/config/elasticsearch.yml
            - ./elasticsearch/master/data:/usr/share/elasticsearch/data
            - ./elasticsearch/master/logs:/usr/share/elasticsearch/logs
        environment:
            - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
        ports:
            - 9200:9200
            - 9300:9300
        expose: 
            - 9200
        networks:
            jaeger_net:
                ipv4_address: 172.16.230.100
    elasticsearch-slave1:
        container_name: es-slave01
        hostname: es-slave01
        image: elasticsearch:7.1.1
        volumes:
            - ./elasticsearch/slave1/conf/es-slave1.yml:/usr/share/elasticsearch/config/elasticsearch.yml
            - ./elasticsearch/slave1/data:/usr/share/elasticsearch/data
            - ./elasticsearch/slave1/logs:/usr/share/elasticsearch/logs
        environment:
            - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
        ports:
            - 9100:9100
            - 9201:9201
        expose: 
            - 9201
        networks:
            jaeger_net:
                ipv4_address: 172.16.230.101
networks:
    jaeger_net:
        driver: bridge
        ipam:
            driver: default
            config:
            -
                subnet: 172.16.230.0/24
es-master.yml
cluster.name: es-cluster
node.name: es-master
node.master: true
node.data: true
network.host: 0.0.0.0
http.port: 9200
transport.port: 9300
discovery.seed_hosts:
  - 172.16.230.100
  - 172.16.230.101
cluster.initial_master_nodes:
  - es-master
http.cors.enabled: true
http.cors.allow-origin: "*"
xpack.security.enabled: false
es-slave1.yml
cluster.name: es-cluster
node.name: es-slave1
node.master: false
node.data: true
network.host: 0.0.0.0
http.port: 9201
discovery.seed_hosts:
  - 172.16.230.100
  - 172.16.230.101
cluster.initial_master_nodes:
  - 172.16.230.100
http.cors.enabled: true
http.cors.allow-origin: "*"
xpack.security.enabled: false
For the command settings used above, see the CLI flags reference.
There really isn't that much you can configure.
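Once the containers are up, it can help to confirm that both Elasticsearch nodes actually joined the cluster before going further. Below is a small sketch that interprets the response of Elasticsearch's `_cluster/health` endpoint. The `ready` rule (not red, at least two nodes) is my own assumption for this two-node setup, and the JSON in `main` is a hand-written sample, not captured output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clusterHealth holds the few fields of the _cluster/health response used here.
type clusterHealth struct {
	ClusterName   string `json:"cluster_name"`
	Status        string `json:"status"` // "green", "yellow", or "red"
	NumberOfNodes int    `json:"number_of_nodes"`
}

// parseHealth decodes the JSON body returned by GET /_cluster/health.
func parseHealth(body []byte) (clusterHealth, error) {
	var h clusterHealth
	err := json.Unmarshal(body, &h)
	return h, err
}

// ready reports whether the cluster looks usable: not red, and with at least
// wantNodes nodes joined. This threshold is an assumption for this setup.
func ready(h clusterHealth, wantNodes int) bool {
	return h.Status != "red" && h.Status != "" && h.NumberOfNodes >= wantNodes
}

func main() {
	// Hand-written sample body; in practice it would come from
	// http://172.16.230.100:9200/_cluster/health
	sample := []byte(`{"cluster_name":"es-cluster","status":"green","number_of_nodes":2}`)

	h, err := parseHealth(sample)
	if err != nil {
		fmt.Println("cannot parse health response:", err)
		return
	}
	fmt.Printf("cluster=%s status=%s nodes=%d ready=%v\n",
		h.ClusterName, h.Status, h.NumberOfNodes, ready(h, 2))
}
```

If `number_of_nodes` stays at 1, the slave node most likely failed to discover the master, and the `discovery.seed_hosts` entries are the first thing I would check.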
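Next, open a browser and go to http://172.16.230.3:16686/
Let's try it out with the official example.
Here I send spans to the Agent, and the Agent then forwards them to the Collector.
Of course, you can also send directly to the Collector.
It depends on your architecture and throughput.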
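One way to keep that choice out of the application code is to decide the destination from configuration. The sketch below is purely illustrative: `ExportConfig`, `Target`, and `chooseTarget` are hypothetical names of mine and not part of any Jaeger or OpenTelemetry package. The addresses come from the compose file above.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// ExportConfig is a hypothetical config struct for this sketch.
type ExportConfig struct {
	AgentAddr    string // host:port of jaeger-agent, reached over UDP
	CollectorURL string // full HTTP URL of the collector's span endpoint
}

// Target describes where spans would be sent.
type Target struct {
	Transport string // "udp" or "http"
	Address   string
}

// chooseTarget prefers the collector when a URL is set, otherwise the agent.
func chooseTarget(cfg ExportConfig) (Target, error) {
	if cfg.CollectorURL != "" {
		u, err := url.Parse(cfg.CollectorURL)
		if err != nil {
			return Target{}, err
		}
		if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
			return Target{}, errors.New("collector URL must be an absolute http(s) URL")
		}
		return Target{Transport: "http", Address: cfg.CollectorURL}, nil
	}
	if cfg.AgentAddr != "" {
		if _, _, err := net.SplitHostPort(cfg.AgentAddr); err != nil {
			return Target{}, err
		}
		return Target{Transport: "udp", Address: cfg.AgentAddr}, nil
	}
	return Target{}, errors.New("no agent address or collector URL configured")
}

func main() {
	viaAgent, err := chooseTarget(ExportConfig{AgentAddr: "172.16.230.4:6831"})
	fmt.Println(viaAgent, err)

	direct, err := chooseTarget(ExportConfig{CollectorURL: "http://172.16.230.2:14268/api/traces"})
	fmt.Println(direct, err)
}
```

Going through the Agent keeps the client cheap (fire-and-forget UDP), while going straight to the Collector removes one hop at the cost of the client knowing where the Collector lives.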
package main
import (
	"context"
	"log"
	"go.opentelemetry.io/otel/api/global"
	"go.opentelemetry.io/otel/label"
	"go.opentelemetry.io/otel/exporters/trace/jaeger"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)
// initTracer creates a new trace provider instance and registers it as global trace provider.
func initTracer() func() {
	// Create and install Jaeger export pipeline
	flush, err := jaeger.InstallNewPipeline(
		jaeger.WithAgentEndpoint("172.16.230.4:6831"),
		// jaeger.WithCollectorEndpoint("http://localhost:14268/api/traces"),
		jaeger.WithProcess(jaeger.Process{
			ServiceName: "trace-demo",
			Tags: []label.KeyValue{
				label.String("exporter", "jaeger"),
				label.Float64("float", 312.23),
			},
		}),
		jaeger.WithSDK(&sdktrace.Config{DefaultSampler: sdktrace.AlwaysSample()}),
	)
	if err != nil {
		log.Fatal(err)
	}
	return func() {
		flush()
	}
}
func main() {
	fn := initTracer()
	defer fn()
	ctx := context.Background()
	tr := global.Tracer("component-main")
	ctx, span := tr.Start(ctx, "foo")
	bar(ctx)
	span.End()
}
func bar(ctx context.Context) {
	tr := global.Tracer("component-bar")
	_, span := tr.Start(ctx, "bar")
	defer span.End()
	// Do bar...
}
In the Jaeger UI, select trace-demo and click Find Traces.
The traces will show up. Click into one to see the details.
foo is the parent span,
and bar is the child span.
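What ties foo and bar together is just a pair of IDs: the child shares the parent's trace ID and records the parent's span ID. Here is a minimal sketch of that bookkeeping. It is not the OpenTelemetry SDK: the `Span` and `Tracer` types are hypothetical, and real IDs are random byte arrays rather than a counter.

```go
package main

import "fmt"

// Span is a hypothetical span holding only the IDs that link spans together.
type Span struct {
	TraceID  int
	SpanID   int
	ParentID int // 0 means this span is the root of its trace
	Name     string
}

// Tracer hands out sequential IDs; a counter keeps the sketch deterministic.
type Tracer struct{ nextID int }

func (t *Tracer) newID() int {
	t.nextID++
	return t.nextID
}

// Start creates a root span when parent is nil, otherwise a child span that
// inherits the parent's trace ID and points back at the parent's span ID.
func (t *Tracer) Start(parent *Span, name string) *Span {
	s := &Span{SpanID: t.newID(), Name: name}
	if parent == nil {
		s.TraceID = t.newID() // a new trace begins here
		return s
	}
	s.TraceID = parent.TraceID
	s.ParentID = parent.SpanID
	return s
}

func main() {
	tr := &Tracer{}
	foo := tr.Start(nil, "foo")
	bar := tr.Start(foo, "bar")

	fmt.Println("same trace:", foo.TraceID == bar.TraceID)
	fmt.Println("bar's parent is foo:", bar.ParentID == foo.SpanID)
}
```

In the real example this linkage travels inside the `context.Context` that `main` passes to `bar`, which is why `bar` has to start its span from that `ctx`.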
With that, the basic environment setup is done.
We'll look at the rest tomorrow.